HOME is where I want to be
But I guess I’m already there
I come home, a clean directory
I guess this must be the place


Learning objectives


Code & Data Organization

The scientific process is naturally incremental, and many projects start life as random notes, some code, then a manuscript, and eventually everything is a bit mixed together.

“Your greatest enemy is yourself four months ago” -Every grad student ever

A good project layout will ultimately make your life easier:

Best practices for project organization

Although there is no “best” way to lay out a project, there are some general principles to adhere to that will make project management easier:

Treat raw data as read only

This is probably the most important goal of setting up a project. Raw data should never be edited, because you can never be sure that you will want to keep any edit you make, and you want to have a record of any changes you make to data. Therefore, treat your raw data as “read only”, perhaps even making a raw_data directory that is never modified. If you do some data cleaning or modification, save the modified file separate from the raw data, and ideally keep all the modifying actions in a script so that you can review and revise them as needed in the future.

Treat generated output as disposable

Anything generated by your scripts should be treated as disposable: it should all be able to be regenerated from your scripts. There are lots of different ways to manage this output, and what’s best may depend on the particular kind of project. At a minimum, it’s useful to have separate directories for each of the following:

  • data: Ideally .csv files as these are flat, transparent, and universal. You may have other specialized formats as well. .rda and .rds are R-specific data files but you never need to use these.
  • code: .R files, perhaps .do files if Stata is your thing, .py files for Python, etc.
  • results: .png or .pdf files for plots; .tex or .txt files for tables
  • papers: .tex if you write in LaTeX, .doc for Word, .Rmd for RMarkdown (which we recommend and will cover tomorrow afternoon), and .pdf or .html rendered documents.

RStudio Projects

RStudio has a feature to help keep everything organized in a self-contained, reproducible package, called a “project”.

A project is a small file with a .Rproj extension, but you can think of all the files and sub-directories as belonging to that project. We recommend creating a directory and a project file for each project you work on. It should look something like this:

When you want to work on this project using R, double click on the .Rproj file, and RStudio will open it and keep everything organized for you. You can also open an existing project from RStudio by clicking "File -> Open project…


Challenge – Create a Project

Let’s create a new project in RStudio. We won’t use this project today (we will ALWAYS work in the repository we pulled from github), but it’s good practice.

  1. Click the “File” menu button, then “New Project”.
  2. Click “New Directory”.
  3. Click “Empty Project”.
  4. Call your project “test_project_lastname”
  5. Store the new directory in the R_DAVIS_2020 folder you should have already created on your desktop.
  6. Click “Create Project”.


If everything went right, RStudio should’ve flickered and you should be looking at a pretty bare RStudio instance. That’s okay. Click on the “Files” tab in the lower right pane. Your .Rproj file should be there with nothing else. You’ve got the bare bones of a new project. Let’s now create the directory structure described above, a folder for each of data, code, results, and papers. You can do this in RStudio by clicking on the “New Folder” button in the Files pane, or in your OS by navigating to the directory you just created.


Challenge – Project Organization

  • In your project directory, either using the Project tab of RStudio or your OS’ file system, create the following directories:
    • data
    • code
    • results
    • papers

The finished product should look something like this on your computer:


Finally, let’s switch back to our R_DAVIS project that we will be working in for the rest of class. Go up to the right hand corner of your screen and click the pull down tab that currently says something like “test_project_lastname.” Once you click it, you can toggle between multiple projects at once by clicking the white square/arrow icon next to the project name. This will allow you to have multiple projects open at once. If you want to switch projects completely, just click the project name (not the white square/arrow icon).

Let’s do that now. Click on the project you’ve named R_DAVIS_2020.

This lesson is adapted from the Software Carpentry: R for Reproducible Scientific Analysis Project management with RStudio materials and the Data Carpentry: R for data analysis and visualization of Ecological Data Before We Start materials. .